{
 "nbformat": 4,
 "nbformat_minor": 2,
 "metadata": {
  "colab": {
   "name": "Monitor wildlife health via image classification",
   "private_outputs": true,
   "provenance": [],
   "collapsed_sections": [],
   "toc_visible": true
  },
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3",
   "language": "python"
  }
 },
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# @title ###### Licensed to the Apache Software Foundation (ASF), Version 2.0 (the \"License\")\n",
    "\n",
    "# Licensed to the Apache Software Foundation (ASF) under one\n",
    "# or more contributor license agreements. See the NOTICE file\n",
    "# distributed with this work for additional information\n",
    "# regarding copyright ownership. The ASF licenses this file\n",
    "# to you under the Apache License, Version 2.0 (the\n",
    "# \"License\"); you may not use this file except in compliance\n",
    "# with the License. You may obtain a copy of the License at\n",
    "#\n",
    "#   http://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing,\n",
    "# software distributed under the License is distributed on an\n",
    "# \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY\n",
    "# KIND, either express or implied. See the License for the\n",
    "# specific language governing permissions and limitations\n",
    "# under the License."
   ],
   "outputs": [],
   "metadata": {
    "cellView": "form",
    "id": "jKlS16hikL4S"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "# Monitor wildlife health via image classification\n",
    "\n",
    "🦜🦧 🐘🦩🦋 🦇 🦀 🦔 🐇 🦫 🦏 🐆 🐗 🦙 🦥 🦎 🦃\n",
    "\n",
    "- **Time estimate**: 2 hours 15 minutes\n",
    "- **Cost estimate**: less than \\$30.00 (_free_ if you use \\$300 [Cloud credits](https://cloud.google.com/free/docs/gcp-free-tier))\n",
    "\n",
    "> [Watch the video in YouTube<br> ![thumbnail](http://img.youtube.com/vi/hUzODH3uGg0/0.jpg)](https://youtu.be/hUzODH3uGg0)\n",
    ">\n",
    "> 📜 Read more about this sample in the [blog post](https://cloud.google.com/blog/topics/developers-practitioners/recovering-global-wildlife-populations-using-ml).\n",
    "\n",
    "This is an interactive notebook that contains all of the code necessary to train an ML model for image classification.\n",
    "This model is trained to recognize animal species from\n",
    "[camera trap](https://en.wikipedia.org/wiki/Camera_trap)\n",
    "pictures."
   ],
   "metadata": {
    "id": "1LUt29D0MSeu"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 🙈 Using this interactive notebook\n",
    "\n",
    "Click the **run** icons ▶️ of each section within this notebook. \n",
    "\n",
    "This notebook code lets you train and deploy an ML model, as well as test its accuracy, from end-to-end. When you run a code cell, the code runs in the notebook's runtime, so you're not making any changes to your personal computer.\n",
    "\n",
    "> 🛎️ **To avoid any errors**, wait for each section to finish in their order before clicking the next “run” icon.\n",
    "\n",
    "This sample must be connected to a Google Cloud project, but nothing else is needed other than your Google Cloud project.\n",
    "You can use an existing project and the cost will be less than \\$30. Alternatively, you can create a new Cloud project with cloud credits for free"
   ],
   "metadata": {
    "id": "jXDRG8QK2BZD"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 🚴‍♀️ Steps included\n",
    "\n",
    "The notebook leverages camera trap images dataset from the\n",
    "[WCS Camera Traps dataset](http://lila.science/datasets/wcscameratraps)\n",
    "from the _Labeled Information Library of Alexandria: Biology and Conservation_ (LILA) organization.\n",
    " \n",
    "Here's a quick summary of what you’ll go through:\n",
    "\n",
    "1. **Create a database of image metadata** _(~5 minutes to complete, costs a few cents)_:\n",
    "   in a\n",
    "   [BigQuery](https://cloud.google.com/bigquery)\n",
    "   table with all the image file names along with their respective category. \n",
    "\n",
    "1. **Train an image classifier** _(~2 hours to complete, costs ~$25.00)_:\n",
    "   With some desired criteria, a\n",
    "   [Dataflow](https://cloud.google.com/dataflow)\n",
    "   pipeline creates a balanced dataset from the metadata database.\n",
    "   It then physically downloads the necessary images from LILA into\n",
    "   [Cloud Storage](https://cloud.google.com/storage),\n",
    "   imports the data into\n",
    "   [AI Platform Unified](https://cloud.google.com/ai-platform-unified/docs/start/introduction-unified-platform),\n",
    "   and starts the model training job.\n",
    "\n",
    "1. **Deploy an ML model** _(costs **$1.25 for every hour** the model is deployed online… so complete the “delete model” step soon)_:\n",
    "   After the model finishes training, we look at the results and deploy it into a Cloud endpoint in AI Platform Unified.\n",
    "\n",
    "1. **Classify images with the model**:\n",
    "   We send some images to the model and get back the model's classification predictions.\n",
    "\n",
    "1. (Optional) **Delete the project** to avoid ongoing costs."
   ],
   "metadata": {
    "id": "Kb9doxCpUcpF"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## ✨ Before you begin, you need to…\n",
    "\n",
    "1. Decide on creating a new\n",
    "   [free project](https://cloud.google.com/free/docs/gcp-free-tier)\n",
    "   _(recommended)_ or using an existing one.\n",
    "   Then copy the project ID and paste it in the `google_cloud_project` field in the \"Entering project details” section below.\n",
    "\n",
    "   > 💡 If you _don't plan to keep the resources_ that you create via this sample, we recommend creating a new project instead of selecting an existing project.\n",
    "   > After you finish these steps, you can delete the project, removing all the resources associated in bulk.\n",
    "\n",
    "1. [_Click here_](https://console.cloud.google.com/flows/enableapi?apiid=dataflow,aiplatform.googleapis.com)\n",
    "   to **enable the following APIs** in your Google Cloud project:\n",
    "   _Dataflow_ and _AI Platform_.\n",
    "\n",
    "1. Make sure that **billing is enabled** for your Google Cloud project,\n",
    "   [_click here_](https://cloud.google.com/billing/docs/how-to/modify-project)\n",
    "   to learn how to confirm that billing is enabled.\n",
    "\n",
    "1. [_Click here_](https://console.cloud.google.com/storage/create-bucket)\n",
    "   to create a Cloud Storage bucket.\n",
    "   Then copy the bucket’s name and paste it in the `cloud_storage_bucket` field in the “Entering project details” section below.\n",
    "\n",
    "   > 🛎️ Make sure it's a _regional_ bucket in a location where\n",
    "   > [AI Platform is available](https://cloud.google.com/ai-platform-unified/docs/general/locations#available_regions).\n",
    "\n",
    "1. [Create a BigQuery dataset](https://cloud.google.com/bigquery/docs/datasets#create-dataset).\n",
    "   Then copy the dataset ID and paste it in the `bigquery_dataset` field in the “Entering project details” section below.\n",
    "\n",
    "   > 💡 You can use the name `samples` for your dataset."
   ],
   "metadata": {
    "id": "VRZjMwyf2BZE"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### ⛏️ Preparing the project environment\n",
    "\n",
    "Click run ▶️ for the following cells to download and install the necessary libraries and resources for this solution.\n",
    "\n",
    "> 💡 You can _optionally_ view the entire\n",
    "> [code in GitHub](https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/people-and-planet-ai/image-classification)."
   ],
   "metadata": {
    "id": "5S5_eLKE-M2o"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# We need libffi-dev to launch the Dataflow pipeline.\n",
    "!apt-get -qq install libffi-dev\n",
    "\n",
    "# Clone the python-docs-samples respository.\n",
    "!git clone https://github.com/GoogleCloudPlatform/python-docs-samples.git\n",
    "\n",
    "# Navigate to the sample code directory.\n",
    "%cd python-docs-samples/people-and-planet-ai/image-classification\n",
    "\n",
    "# Install the sample requirements.\n",
    "!pip install --quiet -U pip\n",
    "!pip install -r requirements.txt"
   ],
   "outputs": [],
   "metadata": {
    "id": "zdiQJibg9iEn"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "> 🛎️ **\\[DON’T PANIC\\]** It’s safe to _ignore the warnings_.\n",
    "> When we `pip install` the requirements, there might be some warnings about conflicting dependency versions.\n",
    "> For the scope of this sample, that’s ok.\n",
    "\n",
    "> ⚠️ **Restart the runtime**: Running the previous cell just updated some libraries and requires to restart the runtime to load those libraries correctly.\n",
    ">\n",
    "> In the top-left menu, click _**\"Runtime\" > \"Restart runtime\"**_."
   ],
   "metadata": {
    "id": "j_Us7ESa90WN"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# After restarting the runtime, navigate back to the sample code directory.\n",
    "%cd python-docs-samples/people-and-planet-ai/image-classification"
   ],
   "outputs": [],
   "metadata": {
    "id": "LevEkQNG94Lx"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "### ✏️ Entering project details\n",
    "\n",
    "Fill these in with the Google Cloud project resources you just created."
   ],
   "metadata": {
    "id": "cipdLUp92BZF"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "google_cloud_project = \"\"  # @param {type:\"string\"}\n",
    "cloud_storage_bucket = \"\"  # @param {type:\"string\"}\n",
    "bigquery_dataset = \"samples\"  # @param {type:\"string\"}\n",
    "\n",
    "# Validate inputs.\n",
    "if not google_cloud_project:\n",
    "    raise ValueError(\"Please provide your google_cloud_project\")\n",
    "if not cloud_storage_bucket:\n",
    "    raise ValueError(\"Please provide your cloud_storage_bucket\")\n",
    "\n",
    "# Authenticate to use the Google Cloud resources.\n",
    "try:\n",
    "    from google.colab import auth\n",
    "\n",
    "    auth.authenticate_user()\n",
    "    print(\"Authenticated\")\n",
    "except ModuleNotFoundError:\n",
    "    import os\n",
    "\n",
    "    if os.environ.get(\"GOOGLE_APPLICATION_CREDENTIALS\") is None:\n",
    "        raise ValueError(\n",
    "            \"Please set your GOOGLE_APPLICATION_CREDENTIALS environment variable to your service account JSON file path.\"\n",
    "        )\n",
    "    print(f\"GOOGLE_APPLICATION_CREDENTIALS: {service_account_file}\")\n",
    "\n",
    "%env GOOGLE_CLOUD_PROJECT={google_cloud_project}"
   ],
   "outputs": [],
   "metadata": {
    "id": "lWfoIueZMO9P",
    "cellView": "form"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "1. Click the run button ▶️ for the cells above.\n",
    "1. Paste the \"verification code\" that is presented, and press **\\[ENTER\\]** to authenticate.\n",
    "\n",
    "Run the following cell, you can leave them with these values or change them if you prefer."
   ],
   "metadata": {
    "id": "sWFY2a_P2BZF"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "cloud_storage_directory = \"samples/wildlife-insights\"  # @param {type:\"string\"}\n",
    "bigquery_table = \"wildlife_images_metadata\"  # @param {type:\"string\"}\n",
    "ai_platform_name_prefix = \"wildlife_classifier\"  # @param {type:\"string\"}\n",
    "region = \"us-central1\"  # @param {type:\"string\"}"
   ],
   "outputs": [],
   "metadata": {
    "cellView": "form",
    "id": "arH5wwVq2BZG"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "> 🛎 Make sure the `region` matches the region you chose for your Cloud Storage bucket."
   ],
   "metadata": {
    "id": "wc5yFA9W2BZG"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 🐻 Creating the database of images metadata\n",
    "\n",
    "To build our model we need data, so we create a database table from the metadata of all the images.\n",
    "This is a _one-time only_ process.\n",
    "\n",
    "Click run on the first Dataflow pipeline to create the images metadata table in BigQuery, which contains 2 fields of information from each image from the JSON file in WCS Camera Traps database:\n",
    "\n",
    "   - `category`: The _species_ we want to predict, this is our _label_.\n",
    "   - `file_name`: The _file path_ where the image file is located.\n",
    " \n",
    "We do some very basic _data cleaning_ like discarding rows with categories that are note useful like `#ref!`, `empty`, `unidentifiable`, `unidentified`, `unknown`.\n",
    "\n",
    "> 💡 As you collect new images from the field via your own camera traps, you continue to store the new image’s category and file name into the BigQuery table.\n",
    "> You could extend this sample to collect more pertinent data for reporting purposes such as location, date, etc."
   ],
   "metadata": {
    "id": "GHs2UDDnT1YG"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# [One time only] Create the images metadata table.\n",
    "!python create_images_metadata_table.py \\\n",
    "  --bigquery-dataset \"{bigquery_dataset}\" \\\n",
    "  --bigquery-table \"{bigquery_table}\" \\\n",
    "  --runner \"DataflowRunner\" \\\n",
    "  --job_name \"wildlife-images-metadata-`date +%Y%m%d-%H%M%S`\" \\\n",
    "  --project \"{google_cloud_project}\" \\\n",
    "  --temp_location \"gs://{cloud_storage_bucket}/{cloud_storage_directory}/temp\" \\\n",
    "  --region \"{region}\" \\\n",
    "  --worker_machine_type \"n1-standard-2\""
   ],
   "outputs": [],
   "metadata": {
    "id": "ivq94L2UT6FJ"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "<button>\n",
    "\n",
    "![View in GitHub](https://www.tensorflow.org/images/GitHub-Mark-32px.png)\n",
    "[View `create_images_metadata_table.py`](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/people-and-planet-ai/image-classification/create_images_metadata_table.py)\n",
    "\n",
    "</button>\n",
    "\n",
    "> 🛎️ We need at least `n1-standard-2` [worker machines](https://cloud.google.com/compute/docs/machine-types#n1_machine_types) due to large RAM usage when parsing the metadata JSON file.\n",
    "\n",
    "You can look up the job details in the Dataflow jobs page:\n",
    "\n",
    "- [console.cloud.google.com/dataflow/jobs](https://console.cloud.google.com/dataflow/jobs)"
   ],
   "metadata": {
    "id": "p8rDiRzZcWvK"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 👟 Training the image classifier\n",
    "\n",
    "One of the challenges of a camera trap dataset is that it usually contains a _different number of images per species_, making it very _unbalanced_.\n",
    "\n",
    "For example, there are tens of thousands of pictures of some species like [tayassu pecari](https://www.google.com/search?q=tayassu+pecari&tbm=isch), while only a handful of pictures of other species like [tolypeutes matacus](https://www.google.com/search?q=tolypeutes+matacus&tbm=isch).\n",
    "\n",
    "In _emojis_, the data looks _skewed_ like this: \n",
    "\n",
    "- 🦜🦜🦜🦜🦜 (5 Green Macaws)\n",
    "- 🦧  (1 Bornean orangutan)\n",
    "- 🐘🐘🐘 (3 Sumatran elephant)\n",
    "- 🐗🐗🐗🐗🐗🐗🐗🐗🐗🐗🐗🐗 (12 wild boars)\n",
    "\n",
    "This could introduce a\n",
    "[_bias_](https://developers.google.com/machine-learning/crash-course/fairness/types-of-bias)\n",
    "into your model which can lead to it predicting a species just because it's more common rather than identifying it by it’s actual features.\n",
    "So instead of using the entire database of images, we leverage Dataflow to create a balanced dataset by specifying the _minimum_ and _maximum_ number of images we want for every category.\n",
    "For categories with too many images, it chooses at most `max_images_per_class` randomly selected images.\n",
    "And categories with less than `min_images_per_class` are discarded as _\"too little information to teach our model to classify this species\"_.\n",
    "\n",
    "> 💡 For this sample, we defaulted to using _between 50 and 100 images_ per species/class to keep the training dataset small. This reduces the training time and the _potential cost_ of prediction accuracy.\n",
    "> Feel free to play around with other numbers.\n",
    "\n",
    "Once Dataflow chooses what images meet our criteria by counting the number of species per categories in BigQuery, it\n",
    "[lazily](https://en.wikipedia.org/wiki/Lazy_evaluation)\n",
    "downloads the actual JPEG images into Cloud Storage from the Lila Science database using it’s file name.\n",
    "This means we are only preprocessing the images we need for training rather than processing the entire dataset of JPEG images.\n",
    " \n",
    "When retraining in the future, if an image already exists in Cloud Storage, there's nothing else to do for that image, as we have created a _cache_ for images that have already been processed in our prior run.\n",
    "\n",
    "After all the images are stored in Cloud Storage, Dataflow creates a\n",
    "[CSV file for AI Platform](https://cloud.google.com/ai-platform-unified/docs/datasets/prepare-image#csv).\n",
    "Each row includes the Cloud Storage path of an image, alongside with the category, our label.\n",
    "\n",
    "Next, Dataflow tells AI Platform to import the CSV file and create a dataset from it. This is where the Dataflow job stops and AI Platform begins to train the model.\n",
    " \n",
    "We let AI Platform do the _data splitting_ automatically for us. By default, it uses _80% of the data for training_, _10% for validation_, and _10% for testing_. See [About data splits for AI Platform models](https://cloud.google.com/ai-platform-unified/docs/general/ml-use) for more information.\n",
    "\n",
    "> 🛎️ For simplicity, in this sample we are training a `CLOUD` model.\n",
    "> This allows us to deploy it to an HTTP endpoint and get predictions via the browser.\n",
    ">\n",
    "> See [Train an Edge model](https://cloud.google.com/ai-platform-unified/docs/training/automl-edge-console)\n",
    "> for information on how to train a model for a microcontroller."
   ],
   "metadata": {
    "id": "I9sD-nSOmbK6"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "min_images_per_class = 50  # @param {type:\"integer\"}\n",
    "max_images_per_class = 100  # @param {type:\"integer\"}"
   ],
   "outputs": [],
   "metadata": {
    "id": "V5Y4byrzxVMC",
    "cellView": "form"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Create a balanced dataset and signal AI Platform to train a model.\n",
    "!python train_model.py \\\n",
    "  --cloud-storage-path \"gs://{cloud_storage_bucket}/{cloud_storage_directory}\" \\\n",
    "  --bigquery-dataset \"{bigquery_dataset}\" \\\n",
    "  --bigquery-table \"{bigquery_table}\" \\\n",
    "  --ai-platform-name-prefix \"{ai_platform_name_prefix}\" \\\n",
    "  --min-images-per-class \"{min_images_per_class}\" \\\n",
    "  --max-images-per-class \"{max_images_per_class}\" \\\n",
    "  --runner \"DataflowRunner\" \\\n",
    "  --job_name \"wildlife-train-model-`date +%Y%m%d-%H%M%S`\" \\\n",
    "  --project \"{google_cloud_project}\" \\\n",
    "  --temp_location \"gs://{cloud_storage_bucket}/{cloud_storage_directory}/temp\" \\\n",
    "  --requirements_file \"requirements.txt\" \\\n",
    "  --region \"{region}\""
   ],
   "outputs": [],
   "metadata": {
    "id": "BgBiq2CzQ9Cp"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "<button>\n",
    "\n",
    "![View in GitHub](https://www.tensorflow.org/images/GitHub-Mark-32px.png)\n",
    "[View `train_model.py`](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/people-and-planet-ai/image-classification/train_model.py)\n",
    "\n",
    "</button>\n",
    "\n",
    "> 🛎️ It can take several minutes for the job to show up in the\n",
    "> [Dataflow jobs page](https://console.cloud.google.com/dataflow/jobs).\n",
    "> See [[ARROW-8983]](https://issues.apache.org/jira/browse/ARROW-8983) for more information.\n",
    "\n",
    "Training the AI Platform model can take a while, depending on the dataset size and the training budget you allow.\n",
    "\n",
    "> 💡 You can adjust the training budget using the `--budget-milli-node-hours` flag. We default to `8000` which is the minimum.\n",
    ">\n",
    "> See [the pricing page](https://cloud.google.com/vision/automl/pricing) and\n",
    "> [`train_budget_milli_node_hours`](https://cloud.google.com/automl/docs/reference/rpc/google.cloud.automl.v1#imageclassificationmodelmetadata)\n",
    "> for more information."
   ],
   "metadata": {
    "id": "h_xtp4SrwLZN"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 🐣 Deploying the model\n",
    "\n",
    "> 🛎 Make sure your model has finished training.\n",
    "> You can look at the training progress on the\n",
    "> [AI Platform training jobs page](https://console.cloud.google.com/ai/platform/training/training-pipelines).\n",
    "\n",
    "> 💡 If you were disconnected from the session due to inactivity, please make sure to re-run the cells at the _\"Before you begin\"_ section at the beginning of the notebook.\n",
    "\n",
    "We need to deploy the trained model into an endpoint to get predictions from it.\n",
    "Since we don't have a micro controller readily available, let’s deploy it in the cloud as an HTTP endpoint.\n",
    "\n",
    "You can deploy it through the Cloud Console: [console.cloud.google.com/ai/platform/models](https://console.cloud.google.com/ai/platform/models)\n",
    "\n",
    "Or alternatively, you can deploy it through the API by running the following cells:"
   ],
   "metadata": {
    "id": "WgHVM2Nwxw9r"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# First we need the model path, we can get it with gcloud.\n",
    "# 💡 If you get an error here, please make sure your model has finished training.\n",
    "cmd_output = !gcloud beta ai models list \\\n",
    "  --project {google_cloud_project} \\\n",
    "  --region {region} \\\n",
    "  --filter \"display_name:{ai_platform_name_prefix}*\" \\\n",
    "  --format \"table[no-heading](display_name,name)\" 2>/dev/null\n",
    "models = sorted([line.split() for line in cmd_output])\n",
    "model_path = models[0][1]\n",
    "\n",
    "print(f\"model_path: {model_path}\")"
   ],
   "outputs": [],
   "metadata": {
    "id": "W-YgS_IPbNfp"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Create an endpoint and deploy the model to it.\n",
    "!python deploy_model.py \\\n",
    "  --project {google_cloud_project} \\\n",
    "  --region {region} \\\n",
    "  --model-path {model_path} \\\n",
    "  --model-endpoint-name {ai_platform_name_prefix}"
   ],
   "outputs": [],
   "metadata": {
    "id": "lRRg39FcYILC"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "<button>\n",
    "\n",
    "![View in GitHub](https://www.tensorflow.org/images/GitHub-Mark-32px.png)\n",
    "[View `deploy_model.py`](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/people-and-planet-ai/image-classification/deploy_model.py)\n",
    "\n",
    "</button>"
   ],
   "metadata": {
    "id": "VQPz8h9z2BZK"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 📺 Visualizing classification of images online\n",
    "\n",
    "Now that we have a deployed model, we can classify images using it.\n",
    " \n",
    "First, lets define some functions to help us visualize and navigate the images, click run:"
   ],
   "metadata": {
    "id": "1jDpAELnxzf7"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "import io\n",
    "import requests\n",
    "from PIL import Image\n",
    "from IPython.display import display\n",
    "\n",
    "from google.cloud import bigquery\n",
    "\n",
    "\n",
    "def display_image(image_file, width=400):\n",
    "    base_url = \"https://lilablobssc.blob.core.windows.net/wcs-unzipped\"\n",
    "    image_bytes = requests.get(f\"{base_url}/{image_file}\").content\n",
    "    if b\"<Error>\" in image_bytes:\n",
    "        raise ValueError(\n",
    "            f\"Error requesting image: {base_url}/{image_file}\\n{image_bytes.decode('utf-8')}\"\n",
    "        )\n",
    "    image = Image.open(io.BytesIO(image_bytes))\n",
    "    display(image.resize((int(width), int(width / image.size[0] * image.size[1]))))\n",
    "\n",
    "\n",
    "def display_samples_for_category(category, num_samples=3, width=400):\n",
    "    client = bigquery.Client()\n",
    "    query_job = client.query(\n",
    "        f\"\"\"\n",
    "      SELECT file_name\n",
    "      FROM `{google_cloud_project}.{bigquery_dataset}.{bigquery_table}`\n",
    "      WHERE category = '{category}'\n",
    "      LIMIT {num_samples}\n",
    "  \"\"\"\n",
    "    )\n",
    "\n",
    "    for row in query_job:\n",
    "        image_file = row[\"file_name\"]\n",
    "        print(f\"{category}: {image_file}\")\n",
    "        display_image(image_file, width)"
   ],
   "outputs": [],
   "metadata": {
    "id": "6-fuHNjhHh2d"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "Now run the following cell to get the model endpoint ID, which we need to make predictions."
   ],
   "metadata": {
    "id": "AFCtx6I195DA"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# First we need the endpoint ID, we can get it with gcloud.\n",
    "stdout = !gcloud beta ai endpoints list \\\n",
    "  --project {google_cloud_project} \\\n",
    "  --region {region} \\\n",
    "  --filter \"display_name={ai_platform_name_prefix}\" \\\n",
    "  --format \"table[no-heading](ENDPOINT_ID)\" 2>/dev/null\n",
    "model_endpoint_id = stdout[0]\n",
    "\n",
    "print(f\"model_endpoint_id: {model_endpoint_id}\")"
   ],
   "outputs": [],
   "metadata": {
    "id": "M9jn39iv6EDu"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "def predict(image_file):\n",
    "  # First, display the image, and then we run our prediction script.\n",
    "  display_image(image_file)\n",
    "  !python predict.py \\\n",
    "    --project \"{google_cloud_project}\" \\\n",
    "    --region \"{region}\" \\\n",
    "    --model-endpoint-id \"{model_endpoint_id}\" \\\n",
    "    --image-file \"{image_file}\""
   ],
   "outputs": [],
   "metadata": {
    "id": "bDogzaQ_FgNw"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "<button>\n",
    "\n",
    "![View in GitHub](https://www.tensorflow.org/images/GitHub-Mark-32px.png)\n",
    "[View `predict.py`](https://github.com/GoogleCloudPlatform/python-docs-samples/blob/main/people-and-planet-ai/image-classification/predict.py)\n",
    "\n",
    "</button>\n",
    "\n",
    "Now, run the following cells to see what our model thinks about each of the following images.\n",
    "The model gives us a list of species it predicts it is, alongside with the confidence score for each prediction."
   ],
   "metadata": {
    "id": "nScTC8Si2BZL"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Species: dicerorhinus sumatrensis\n",
    "predict(\"animals/0325/1529.jpg\")"
   ],
   "outputs": [],
   "metadata": {
    "id": "Pxy3sMAFFcAP"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Species: didelphis imperfecta\n",
    "predict(\"animals/0667/1214.jpg\")"
   ],
   "outputs": [],
   "metadata": {
    "id": "VF-S9qpnEqjD"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Species: tapirus indicus\n",
    "predict(\"animals/0036/0072.jpg\")"
   ],
   "outputs": [],
   "metadata": {
    "id": "LneZo04sF_Bm"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Species: leopardus wiedii\n",
    "predict(\"animals/0000/1705.jpg\")"
   ],
   "outputs": [],
   "metadata": {
    "id": "qJp96gNqHUwn"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Species: hemigalus derbyanus\n",
    "predict(\"animals/0036/0566.jpg\")"
   ],
   "outputs": [],
   "metadata": {
    "id": "yl523QeUGXmR"
   }
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Species: dasypus novemcinctus\n",
    "predict(\"animals/0000/0425.jpg\")"
   ],
   "outputs": [],
   "metadata": {
    "id": "xlgjmCEQRUsS"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "🕵️ While analyzing the model’s evaluation, you may notice that many species of the same family are very similar and it may be confusing the model. This means it might be worth trying to classify the image to a family instead of a specific species, or get more images from different angles/proximity to detect it better."
   ],
   "metadata": {
    "id": "r7INPuD2I2vV"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "# 🧹 Cleaning up to avoid additional costs\n",
    "\n",
    "To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.\n",
    " \n",
    "If you wish to delete the project you created for this tutorial (it’s the easiest way to eliminate billing), here are the steps:"
   ],
   "metadata": {
    "id": "MGl_1UyiGVl6"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## Deleting the project\n",
    "\n",
    "The easiest way to eliminate billing is to delete the project that you created for the tutorial.\n",
    "\n",
    "To delete the project:\n",
    "\n",
    "> ⚠️ **Caution**: Deleting a project has the following effects:\n",
    ">\n",
    "> - **Everything in the project is deleted.** If you used an existing project for this tutorial, when you delete it, you also delete any other work you've done in the project.\n",
    "> - **Custom project IDs are lost.** When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an `appspot.com` URL, delete selected resources inside the project instead of deleting the whole project.\n",
    ">\n",
    "> If you plan to explore multiple tutorials and quickstarts, reusing projects can help you avoid exceeding project quota limits.\n",
    "\n",
    "1. In the Cloud Console, go to the **Manage resources** page.\n",
    "\n",
    "  <button>\n",
    "\n",
    "  [Go to Manage resources](https://console.cloud.google.com/iam-admin/projects)\n",
    "\n",
    "  </button>\n",
    "\n",
    "1. In the project list, select the project that you want to delete, and then click **Delete**.\n",
    "\n",
    "1. In the dialog, type the project ID, and then click **Shut down** to delete the project.\n"
   ],
   "metadata": {
    "id": "nv7phC7WGZuT"
   }
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 🎉 Congratulations, you have successfully completed this code lab!\n",
    " \n",
    "🙌 You made it this far, congrats.\n",
    "Dive deeper by visiting the\n",
    "[code on GitHub](https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/people-and-planet-ai/image-classification).\n",
    "\n",
    "> 🛎️ Was this a meaningful project for you? If so you can you share your experience in\n",
    "> [these 2 quick questions here](https://docs.google.com/forms/d/e/1FAIpQLSeClJev12m2CGE5uwJAvfJRZe9LYUZR4nimsDjrRjvc5xBifA/viewform). "
   ],
   "metadata": {
    "id": "Em4E_RNv2BZM"
   }
  }
 ]
}
